@mdvoretc-intel
Contributor

Details:

  • The change allows parameters to be recognized alongside constants as valid weight inputs for transformations producing FullyConnectedCompressed nodes

Description of the issue:

At present, the FC_COMPRESSED_WEIGHT_PATTERN macro contains a pattern for dequantization of a constant integer weight. This pattern is used to recognize and fold cases where fused weight dequantization can be used, replacing them with FullyConnectedCompressed nodes. Because it expects a constant weight input, the pattern fails to recognize quantized LoRA weights, which are provided as parameters:
[image: fc_compressed_param_before]
With the changes in this patch, these weights can be recognized, and the transformations can proceed and produce nodes that would then leverage oneDNN fused QGEMM for execution:
[image: fc_compressed_param_after]
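
To illustrate the idea only, here is a minimal self-contained sketch of a weight predicate widened from "constant only" to "constant or parameter". It does not use the OpenVINO pattern API: `Node`, `NodeKind`, `make_node`, both predicates, and the simplified `Multiply(Convert(weight), scale)` chain are hypothetical stand-ins and are not taken from the patch or from the FC_COMPRESSED_WEIGHT_PATTERN macro.

```cpp
// Toy model, NOT the OpenVINO API: every type and name below is hypothetical.
#include <memory>
#include <utility>
#include <vector>

enum class NodeKind { Constant, Parameter, Convert, Multiply };

struct Node {
    NodeKind kind;
    std::vector<std::shared_ptr<Node>> inputs;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make_node(NodeKind kind, std::vector<NodePtr> inputs = {}) {
    return std::make_shared<Node>(Node{kind, std::move(inputs)});
}

// Old behaviour in this sketch: only a constant may feed the dequantization chain.
bool is_weight_source_const_only(const NodePtr& n) {
    return n && n->kind == NodeKind::Constant;
}

// New behaviour in this sketch: a constant or a runtime parameter is accepted.
bool is_weight_source(const NodePtr& n) {
    return n && (n->kind == NodeKind::Constant || n->kind == NodeKind::Parameter);
}

// Matches Multiply(Convert(weight), scale), an assumed and simplified
// dequantization chain; zero-point handling is left out for brevity.
bool matches_dequantized_weight(const NodePtr& n, bool (*weight_pred)(const NodePtr&)) {
    if (!n || n->kind != NodeKind::Multiply || n->inputs.size() != 2) {
        return false;
    }
    const NodePtr& convert = n->inputs[0];
    if (!convert || convert->kind != NodeKind::Convert || convert->inputs.size() != 1) {
        return false;
    }
    return weight_pred(convert->inputs[0]);
}
```

With the const-only predicate, a chain rooted at a `Parameter` is rejected; with the widened predicate, the same chain matches, while chains rooted at a `Constant` match in both cases.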

Tickets:

@github-actions github-actions bot added category: GPU OpenVINO GPU plugin category: transformations OpenVINO Runtime library - Transformations labels Oct 2, 2025
@sys-openvino-ci sys-openvino-ci added the ExternalIntelPR External contributor from Intel label Oct 2, 2025
@mdvoretc-intel
Contributor Author

build_jenkins

@mdvoretc-intel mdvoretc-intel marked this pull request as ready for review October 29, 2025 11:36
@mdvoretc-intel mdvoretc-intel requested review from a team as code owners October 29, 2025 11:36
@mdvoretc-intel mdvoretc-intel requested review from CuriousPanCake and removed request for a team October 29, 2025 11:36
@mdvoretc-intel
Contributor Author

build_jenkins

@mdvoretc-intel mdvoretc-intel force-pushed the param_quant_weight branch 2 times, most recently from b82b902 to 476f80f Compare October 30, 2025 09:25
Contributor

@mklimenk mklimenk left a comment


The two branches of the if (pattern_map.count(weights_const_m)) { ... } condition share a lot of similarities. Please consider refactoring them to avoid code duplication.
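
One possible shape for such a refactoring, sketched with toy types rather than the actual patch: resolve the matched weight node once, then run a single shared code path. `PatternMap`, the string keys `"weights_const"` and `"weights_param"`, `matched_weight`, and `build_fc_compressed` are all hypothetical names chosen for illustration; the real code keys its pattern map differently.

```cpp
// Toy model, NOT the OpenVINO API: every type and name below is hypothetical.
#include <initializer_list>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

struct Node {
    std::string name;
};
using NodePtr = std::shared_ptr<Node>;
using PatternMap = std::map<std::string, NodePtr>;

// Return whichever weight node the pattern matched (constant is checked first),
// or nullptr if neither key is present.
NodePtr matched_weight(const PatternMap& pattern_map) {
    for (const char* key : {"weights_const", "weights_param"}) {
        auto it = pattern_map.find(key);
        if (it != pattern_map.end()) {
            return it->second;
        }
    }
    return nullptr;
}

// Shared logic that previously would have been duplicated in both branches.
// A string stands in for the node that a real transformation would create.
std::string build_fc_compressed(const PatternMap& pattern_map) {
    NodePtr weight = matched_weight(pattern_map);
    if (!weight) {
        throw std::invalid_argument("pattern matched neither a constant nor a parameter weight");
    }
    return "FullyConnectedCompressed(" + weight->name + ")";
}
```

The branch on the weight kind is confined to `matched_weight`, so everything downstream is written once regardless of whether the weight came from a constant or a parameter.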

This change enables quantized LoRA weights, passed as parameters during
execution, to be recognized by the transformations that produce
FullyConnectedCompressed nodes for QGEMM execution.
The test previously expected the transformation to fail due to the use of input2
as a weight. The new logic allows use of parameters as weights, so the test has
been adjusted to expect a successful transformation.
Contributor

@mklimenk mklimenk left a comment


Looks much cleaner now, thanks!

@mdvoretc-intel
Contributor Author

@CuriousPanCake please review.

@github-actions github-actions bot removed the category: transformations OpenVINO Runtime library - Transformations label Nov 11, 2025
@mdvoretc-intel
Contributor Author

@CuriousPanCake please review.


3 participants